Abstract

This paper develops maximum score estimation of preference parameters
in the binary choice model under uncertainty in which the decision rule is
affected by conditional expectations. The preference parameters are estimated
in two stages: we estimate conditional expectations nonparametrically in the
first stage and then the preference parameters in the second stage based on
Manski (1975, 1985)'s maximum score estimator using the choice data and
first stage estimates. The paper establishes consistency and derives the rate of
convergence of the corresponding two-stage estimator, which is of independent
interest for maximum score estimation with generated regressors. The paper
also provides results of some Monte Carlo experiments.

Keywords: discrete choice, maximum score estimation, generated regressor, 
preference parameters, M-estimation, cube root asymptotics 
JEL Codes: C12, C13, C14. 



*This work was in part supported by European Research Council (ERC-2009-StG-240910- 
ROMETA) and by the National Research Foundation of Korea Grant funded by the Korean Govern- 
ment (NRF-2012S1A5A8023573). We would like to thank a co-editor and two anonymous referees 
for helpful comments. 

†Corresponding author. Address: School of Economics, Hongik University, 94 Wausan-ro,
Mapo-Gu, Seoul, South Korea 121-791. E-mail: mjacsung@hongik.ac.kr.




1 Introduction 

This paper develops a semiparametric two-stage estimator of preference parameters 
in the binary choice model where the agent's decision rule is affected by conditional 
expectations of outcomes which are uncertain at the choice making stage and the pref- 
erence shocks are nonparametrically distributed with unknown form of heteroskedas- 
ticity. The pioneering papers of Manski (1991, 1993) establish nonparametric identi- 
fication of agents' expectations in the discrete choice model under uncertainty when 
the expectations are fulfilled and conditioned only on observable variables. Utiliz- 
ing this result, Ahn and Manski (1993) proposed a two-stage estimator for a binary 
choice model under uncertainty where the agent's utility was linear in parameters and the
unobserved preference shock had a known distribution. Specifically, Ahn and Manski 
(1993) estimated the agent's expectations nonparametrically in the first stage and 
then the preference parameters in the second stage by maximum likelihood estima- 
tion using the choice data and the expectation estimates. Ahn (1995, 1997) extended 
the two-step approach further. On one hand, Ahn (1995) considered nonparamet- 
ric estimation of conditional choice probabilities in the second stage. On the other 
hand, Ahn (1997) retained the linear index structure of the Ahn-Manski model but
estimated the preference parameters in the second stage using the average derivative
method, hence allowing for an unknown distribution of the unobservable. In principle,
alternative approaches accounting for nonparametric unobserved preference shock 
can also be applied in the second step estimation of this framework. Well known 
methods include Cosslett (1983), Powell et al. (1989), Ichimura (1993), Klein and 
Spady (1993), and Coppejans (2001), among many others. 

The aforementioned papers allow for nonparametric setting of the distribution of 
the preference shock. But the unobserved shock is assumed either to be indepen- 
dent of or to have specific dependence structure with the covariates. By contrast, 
Manski (1975, 1985) considered a binary choice model under the conditional median 
restriction and thus allowed for general form of heteroskedasticity for the unobserved 
shock. It is particularly important, as shown in Brown and Walker (1989), to ac- 
count for heteroskedasticity in random utility models. Therefore, this paper develops 



the semiparametric two-stage estimation method for the Ahn-Manski model where 
the second stage is based on Manski (1975, 1985)'s maximum score estimator and 
thus can accommodate nonparametric preference shock with unknown form of het- 
eroskedasticity. 

From a methodological perspective, this paper also contributes to the literature 
of two-stage M-estimation method with non-smooth criterion functions. When the 
true parameter value can be formulated as the unique root of certain population mo- 
ment equations, the problem of M-estimation can be reduced to that of Z-estimation. 
Chen et al. (2003) considered semiparametric non-smooth Z-estimation problem with 
estimated nuisance parameter, while allowing for over-identifying restrictions. Chen 
and Pouzo (2009,2012) developed general estimation methods for semiparametric 
and nonparametric conditional moment models with possibly nonsmooth general- 
ized residuals. For the general M-estimation problem, Ichimura and Lee (2010) 
assumed some degree of second-order expansion of the underlying objective function 
and established conditions under which one can obtain a $\sqrt{N}$-consistent estimator
of the finite dimensional parameter where N is the sample size when the nuisance 
parameter at the first stage is estimated at a slower rate. For more recent papers 
on two-step semiparametric estimation, see Ackerberg et al. (2012), Chen et al. 
(2013), Escanciano et al. (2012, 2013), Hahn and Ridder (2013), and Mammen et 
al. (2013), among others. None of the aforementioned papers include the maximum 
score estimation in the second stage estimation. 

For this paper, the second stage maximum score estimation problem cannot be 
reformulated as a Z-estimation problem. Furthermore, even in absence of nuisance 
parameter, Kim and Pollard (1990) demonstrated that the maximum score estimator 
can only have the cube root rate of convergence and its asymptotic distribution is 
non-standard. The most closely related paper is Lee and Pun (2006) who showed 
that m out of n bootstrapping can be used to consistently estimate sampling
distributions of nonstandard M-estimators with nuisance parameters. Their general
framework includes the maximum score estimator as a special case, but it allows
only for parametric nuisance parameters. Therefore, established results in the two-stage



estimation literature are not immediately applicable and the asymptotic theory devel- 
oped in this paper may also be of independent interest for non-smooth M-estimation 
with nonparametrically generated covariates. 

The rest of the paper is organized as follows. Section 2 sets up the binary choice 
model under uncertainty and presents the two-stage maximum score estimation pro- 
cedure of the preference parameters. Section 3 states regularity assumptions and 
derives consistency and rate of convergence of the estimator. Section 4 presents 
Monte Carlo studies assessing finite sample performance of the estimator. Section 5 
concludes this paper. Proofs of technical results along with some preliminary lemmas 
are given in the Appendices. 

2 Maximum Score Estimation of a Binary Choice 
Model under Uncertainty 

Suppose an agent must choose between two actions denoted by 0 and 1. The utility
from choosing action $j \in \{0, 1\}$ is

$$z_j'\beta_1 + y'\beta_2 + \varepsilon_j.$$

The realization of the random vector $(z_j, \varepsilon_j) \in \mathbb{R}^k \times \mathbb{R}$ is known to the agent before the
action is chosen, and the random vector $y \in \mathbb{R}^p$ is realized only after the action is chosen.
The random vectors $(z_1, \varepsilon_1)$ and $(z_0, \varepsilon_0)$ are not necessarily identical. The distribution
of $y$ depends on the chosen action and the realization of a random vector $x \in \mathbb{R}^q$. Let
$E^s(\cdot|\cdot)$ denote the agent's subjective conditional expectation. Given the realization
of $(z_j, \varepsilon_j)$, the agent chooses the action $d$ that maximizes the expected utility:

$$z_j'\beta_1 + E^s(y|x, d = j)'\beta_2 + \varepsilon_j, \quad j \in \{0, 1\}.$$

Thus the decision rule has the form

$$d = 1\{z'\beta_1 + [E^s(y|x, d = 1) - E^s(y|x, d = 0)]'\beta_2 > \varepsilon\}, \qquad (2.1)$$

where $z \equiv z_1 - z_0$, $\varepsilon \equiv \varepsilon_0 - \varepsilon_1$, and $1\{\cdot\}$ is an indicator function whose value is one
if the argument is true and zero otherwise.

As in Ahn and Manski (1993), suppose that expectations are fulfilled: 

$$E^s(y|x, d = j) = E(y|x, d = j).$$

We assume that the researcher does not observe the realizations of $\varepsilon$ and $E(y|x, d = j)$,
but does observe that of $(z, x, d, y)$.

Let $G(x) \equiv E(y|x, d = 1) - E(y|x, d = 0)$ and let $w \equiv (z, G(x)) \in W \subset \mathbb{R}^{k+p}$,
where $W$ denotes the support of the distribution of $w$. Then, equation (2.1) can be
written as

$$d = 1\{w'\beta > \varepsilon\}, \qquad (2.2)$$

where $\beta \equiv (\beta_1, \beta_2)$ is a vector of unknown preference parameters. The set of
assumptions leading to the binary choice model in (2.2) is equivalent to that of Ahn
and Manski (1993, equations (1)-(3)).

However, in this paper we consider an important deviation from the setup of Ahn and
Manski (1993), where the unobserved preference shock $\varepsilon$ is independent of $(z, x)$ with
a known distribution function. Instead, we consider inference under a flexible
specification of the unobserved model component. Following Manski (1985), we impose
the restriction:

$$\mathrm{Med}(\varepsilon|z, x) = 0. \qquad (2.3)$$

The conditional median independence assumption in (2.3) allows for heteroskedasticity
of unknown form and hence is substantially weaker than the assumption imposed
in Ahn and Manski (1993). Given (2.3), the model (2.1) then satisfies

$$\mathrm{Med}(d|z, x) = 1\{w'\beta > 0\}. \qquad (2.4)$$

Let $\Theta$ denote the space of preference parameters, and let $\Lambda_j$, $j \in \{1, ..., p\}$,
denote the function space of the difference of conditional expectations $E(y_j|x, d = 1) -
E(y_j|x, d = 0)$. Moreover, let $b = (b_1, b_2)$ and $\gamma_j(x)$, $j \in \{1, ..., p\}$, denote generic
elements of $\Theta$ and $\Lambda_j$, respectively. Let $\gamma(x) \equiv (\gamma_1(x), ..., \gamma_p(x))$ and let $\Lambda \equiv \prod_{j=1}^{p} \Lambda_j$
be the space of $\gamma$. We refer to $\beta = (\beta_1, \beta_2)$ and $G(x)$ as the true finite-dimensional
and infinite-dimensional parameters.

Suppose that data consist of random samples $(z_i, x_i, d_i, y_i)$, $i = 1, ..., N$. We
estimate in the first stage the conditional expectations, which are not observed. Let
$\hat{G}(x_i)$ denote an estimate of the difference in conditional expectations. Using the
estimate $\hat{G}$, we estimate the preference parameters $\beta$ in the second stage by the
method of maximum score estimation of Manski (1975, 1985). For any $b$ and $\gamma$,
define the sample score function

$$S_N(b, \gamma) = \frac{1}{N} \sum_{i=1}^{N} \tau_i (2d_i - 1) 1\{z_i'b_1 + \gamma(x_i)'b_2 > 0\}, \qquad (2.5)$$

where $\tau_i \equiv \tau(x_i)$ is a predetermined weight function that avoids undue influence from
the estimated $\hat{G}(x_i)$ at the boundaries of the support of $x_i$. The two-stage estimator of
$\beta$ is now defined as

$$\hat{\beta} = \arg\max_{b \in \Theta} S_N(b, \hat{G}). \qquad (2.6)$$
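As an illustration of the second stage, the following is a minimal Python sketch of the sample score function in (2.5) and a brute-force maximization as in (2.6). It assumes scalar $z$ and scalar $G$ (that is, $k = p = 1$), the scale normalization $|b_1| = 1$, and a user-supplied grid for $b_2$; the function names `sample_score` and `max_score_grid` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_score(b1, b2, z, g_hat, d, tau):
    """Sample score S_N(b, gamma) of (2.5), evaluated at gamma = g_hat.

    z, g_hat, d, tau are 1-d arrays of length N (scalar regressors assumed).
    """
    index = z * b1 + g_hat * b2
    return np.mean(tau * (2 * d - 1) * (index > 0))

def max_score_grid(z, g_hat, d, tau, b2_grid):
    """Maximize the score over b1 in {-1, 1} and b2 on a grid (assumed design)."""
    best_score, best_b = -np.inf, None
    for b1 in (-1.0, 1.0):
        for b2 in b2_grid:
            s = sample_score(b1, b2, z, g_hat, d, tau)
            if s > best_score:          # keep the first maximizer encountered
                best_score, best_b = s, (b1, float(b2))
    return best_b, best_score
```

Because the score is a step function of $b$, the maximizer is generally a set rather than a point; the sketch simply returns the first grid point attaining the maximum.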

3 Consistency and Rate of Convergence of $\hat{\beta}$

Let $F(t; b)$ and $f(t; b)$, respectively, denote the distribution and density of $w'b$. To
simplify the analysis, we consider fixed trimming such that $\tau(x) = 1(x \in X)$, where
$X \subset \mathbb{R}^q$ is a predetermined, compact, and convex subset of the support of $x$. For any
real vector $b$, let $\|b\|_E$ denote the Euclidean norm of $b$. For any $p$-dimensional vector of
functions $h(x)$, let

$$\|h\|_{\sup} \equiv \left\| \left( \|h_1\|_{\sup}, ..., \|h_p\|_{\sup} \right) \right\|_E,$$

where $\|h_j\|_{\sup} \equiv \sup\{|h_j(x)| : x \in X\}$ and $h_j(x)$ denotes the $j$th component of $h$. Let
$\tilde{z}$ be the subvector of $z$ excluding the component $z_1$. Write $b_1 = (b_{1,1}, \tilde{b}_1)$ and
$\beta_1 = (\beta_{1,1}, \tilde{\beta}_1)$. We assume the following regularity conditions.

Assumption 1. Assume that:

C1. $\Theta = \{-1, 1\} \times T$, where $T$ is a compact subspace of $\mathbb{R}^{k+p-1}$ and $(\tilde{\beta}_1, \beta_2)$ is an
interior point of $T$.

C2. (a) The support of the distribution of $w$ is not contained in any proper linear
subspace of $\mathbb{R}^{k+p}$. (b) $0 < P(d = 1|w) < 1$ for almost every $w$. (c) For almost
every $(\tilde{z}, x)$, the distribution of $z_1$ conditional on $(\tilde{z}, x)$ has everywhere positive
density with respect to Lebesgue measure.

C3. $\mathrm{Med}(\varepsilon|z, x) = 0$ for almost every $(z, x)$.

C4. There is a positive constant $L < \infty$ such that $|F(t_1; b) - F(t_2; b)| \le L|t_1 - t_2|$
for all $(t_1, t_2) \in \mathbb{R}^2$ uniformly over $b \in \Theta$.

C5. $\|\hat{G} - G\|_{\sup} = o_p(1)$.



Because the scale of $\beta$ for the model characterized by (2.4) cannot be identified,
Assumption C1 imposes scale normalization by requiring that the absolute value of
the first coefficient is unity. Assumption C2 implies that $F(t; b)$ is absolutely
continuous and has density $f(t; b)$ for each $b \in \{-1, 1\} \times T$. Assumptions C1 - C3
are standard in the maximum score estimation literature (see e.g., Manski (1985),
Horowitz (1992), and Florios and Skouras (2008)). Assumption C4 is a mild condition
on the distribution of the index variable $w'b$. Assumption C5 requires uniform
consistency of first stage estimation. This assumption can be easily verified for
standard nonparametric estimators such as series estimators (Newey (1997, Theorem
1)) and the kernel regression estimator (Bierens (1983, Theorem 1), Bierens (1987,
Theorem 2.3.1) and Andrews (1995, Theorem 1)).

Given these regularity conditions, we have the following result. 

Theorem 1 (Consistency). Let Assumption 1 (C1 - C5) hold. Then the two-stage
estimator $\hat{\beta}$ given by (2.6) converges to $\beta$ in probability as $N \to \infty$.

In addition to consistency, we also study the rate of convergence of the estimator $\hat{\beta}$.
Let $\tilde{w} \equiv (\tilde{z}, G(x))$, $\tilde{b} \equiv (\tilde{b}_1, b_2)$ and $\tilde{\beta} \equiv (\tilde{\beta}_1, \beta_2)$. Let $F_\varepsilon(\cdot|z, x)$ denote the distribution
function of $\varepsilon$ conditional on $(z, x)$ and $g_1(z_1|\tilde{z}, x)$ denote the density function of $z_1$
conditional on $(\tilde{z}, x)$. Let $p_1(\cdot, \tilde{z}, x)$ denote the partial derivative of $P(d = 1|z, x)$
with respect to $z_1$. Define the following matrix

$$V \equiv \beta_{1,1} E\left[ \tau \, p_1(-\tilde{w}'\tilde{\beta}/\beta_{1,1}, \tilde{z}, x) \, g_1(-\tilde{w}'\tilde{\beta}/\beta_{1,1}|\tilde{z}, x) \, \tilde{w}\tilde{w}' \right].$$

Since the objective function in (2.5) is non-smooth, we require that the nonparametric
parameter of the estimation problem possess a certain degree of smoothness
to facilitate derivation of the rate of convergence result. In particular, we consider
the following well known class of smooth functions (see, e.g., van der Vaart and
Wellner (1996, Section 2.7.1)): For $0 < \alpha < \infty$, let $C_M^{\alpha}(X)$ denote the class of functions
$f : X \to \mathbb{R}$ with $\|f\|_{\alpha} \le M$, where for any $q$-dimensional vector of non-negative
integers $k = (k_1, ..., k_q)$,

$$\|f\|_{\alpha} \equiv \max_{\sigma(k) \le \underline{\alpha}} \|D^k f\|_{\sup} + \max_{\sigma(k) \le \underline{\alpha}} \sup_{x \ne x'} \frac{|D^k f(x) - D^k f(x')|}{\|x - x'\|_E^{\alpha - \underline{\alpha}}},$$

where $\sigma(k) \equiv \sum_{j=1}^{q} k_j$, $\underline{\alpha}$ denotes the greatest integer smaller than $\alpha$, and $D^k$ is the
differential operator

$$D^k \equiv \frac{\partial^{\sigma(k)}}{\partial x_1^{k_1} \cdots \partial x_q^{k_q}}.$$

Given the norm $\|\cdot\|_{\alpha}$, for any $p$-dimensional vector of functions $h(x)$, let $\|h\|_{\alpha,p} \equiv
\| (\|h_1\|_{\alpha}, ..., \|h_p\|_{\alpha}) \|_E$, where $h_j(x)$ denotes the $j$th component of $h$. Note that $\|\cdot\|_{\alpha,p}$
is a stronger norm than $\|\cdot\|_{\sup}$ used in condition C5 for the uniform consistency of the
first stage estimator.

The regularity conditions imposed for the convergence rate result are stated as 
follows. 

Assumption 2. Assume that:

C6. The support of $z$ is bounded.

C7. There is a positive constant $B < \infty$ such that (i) for every $z_1$ and for almost
every $(\tilde{z}, x)$,

$$g_1(z_1|\tilde{z}, x) < B, \quad |\partial g_1(z_1|\tilde{z}, x)/\partial z_1| < B, \quad \text{and} \quad |\partial^2 g_1(z_1|\tilde{z}, x)/\partial z_1^2| < B,$$

and (ii) for non-negative integers $i$ and $j$ satisfying $i + j \le 2$,

$$|\partial^{i+j} F_\varepsilon(t|z, x)/\partial t^i \partial z_1^j| < B$$

for every $t$ and $z_1$ and for almost every $(\tilde{z}, x)$.

C8. All elements of the vector $w$ have finite third absolute moments.

C9. The matrix $V$ is positive definite.

C10. For each $j \in \{1, ..., p\}$, $\Lambda_j = C_M^{\alpha}(X)$ for some $\alpha > 2q$ and $M < \infty$.

C11. $\|\hat{G} - G\|_{\alpha,p} = O_p(\varepsilon_N)$, where $\varepsilon_N$ is a non-stochastic positive real sequence such
that $N^{1/3}\varepsilon_N \le 1$ for each $N$.



Assumption C6 is standard in deriving asymptotic properties of Manski's maximum
score estimator (see, e.g., Kim and Pollard (1990), pp. 213-216). Assumption
C7 requires some smoothness of the density $g_1(z_1|\tilde{z}, x)$ and the distribution $F_\varepsilon(t|z, x)$.
Assumption C8 is mild. Since $-V$ corresponds to the second order derivative of
$E[S_N(b, \gamma)]$ with respect to $\tilde{b}$ evaluated at the true parameter values, Assumption C9 is
analogous to the classic condition of the Hessian matrix being non-singular in the
M-estimation framework. Assumption C10 imposes smoothness on the nonparametric
parameter $\gamma$ and hence helps to control the complexity of the space $\Lambda$.

Assumption C11 requires that the first stage estimator should converge under
the norm $\|\cdot\|_{\alpha,p}$ at a rate no slower than $N^{-1/3}$. Note that convergence of $\hat{G}$ to $G$
in the norm $\|\cdot\|_{\alpha,p}$ also implies uniform convergence of derivatives of $\hat{G}$ to those of
$G$. For integer-valued $\alpha > 0$, Assumption C11 is fulfilled provided that for every vector of
non-negative integers $k = (k_1, ..., k_q)$ that satisfies $\sigma(k) \le \alpha$,

$$\sup_{x \in X} \left| D^k \hat{G}_{t,j}(x) - D^k G_{t,j}(x) \right| = O_p(\varepsilon_N), \qquad (3.1)$$

where $\hat{G}_{t,j}(x)$ denotes the estimate of $G_{t,j}(x) \equiv E(y_j|x, d = t)$ for $(t, j) \in \{0, 1\} \times
\{1, ..., p\}$. The condition (3.1) can also be verified for series estimators (Newey (1997,
Theorem 1)) and the kernel regression estimator (Andrews (1995, Theorem 1)).

Theorem 2 (Rate of Convergence). In addition to Assumption 1 (C1 - C5), let
Assumption 2 (C6 - C11) also hold. Then $\|\hat{\beta} - \beta\|_E = O_p(N^{-1/3})$.

Note that if $G$ were known a priori to the researcher, the preference parameters
$\beta$ could be estimated using covariates $w$ and the resulting maximum score estimator
would have the cube root rate of convergence (Kim and Pollard (1990)). In the case
of unknown $G$, Theorem 2 implies that the two-stage estimator $\hat{\beta}$ retains the same
convergence rate as its corresponding infeasible estimator.

We conclude this section by making some remarks on the asymptotic distribution
of the two-stage estimator $\hat{\beta}$. Without the first stage estimation, Kim and Pollard
(1990) obtained the limiting distribution of the maximum score estimator. In view
of this, we conjecture that the limiting distribution of our proposed estimator of $\beta$ might
be the same as that of Kim and Pollard (1990), as long as the first stage estimator
converges uniformly in probability at a sufficiently faster rate than $N^{-1/3}$ with other
regularity conditions. Once we show this, the inference on $\beta$ can be carried out by
subsampling (Delgado et al. (2001)) since the standard bootstrap cannot be used to
estimate the distribution of the maximum score estimator consistently (Abrevaya
and Huang (2005)). There does not seem to exist a known result on nonstandard
M-estimation with nonparametrically generated regressors. It is thus a future research
topic to establish the limiting distribution of our estimator and more generally to
develop a general approach for nonstandard M-estimation with nonparametrically
generated nuisance parameters.

4 Monte Carlo Simulations 

We adopt the following DGP in the simulation study of the two-stage maximum score
estimator:

$$d = 1\{\beta_0 + z\beta_1 + G(x)\beta_2 > \varepsilon\},$$

where $x = (x_1, x_2)$, $G(x) = E(y|x, d = 1) - E(y|x, d = 0)$, $z \sim \text{Logistic}$, $x_1 \sim
U(-1, 1)$, $x_2 \sim \text{Beta}(2, 2)$ and $\varepsilon|(z, x) \sim N(0, 1 + z^2 + x_1^2 + x_2^2)$. The scalar random
variable $y$ is generated according to

$$y = d(\gamma_{01} + \gamma_{11}x_1 + \gamma_{21}x_2 + u_1) + (1 - d)(\gamma_{00} + \gamma_{10}x_1 + \gamma_{20}x_2 + u_0), \qquad (4.1)$$

where $(u_1, u_0)$ are independent of $(x, z, \varepsilon)$ and are jointly normally distributed with
$E(u_1) = E(u_0) = 0$, $Var(u_1) = Var(u_0) = 1$, and $Cov(u_1, u_0) = \rho$. Given (4.1),

$$G(x) = \gamma_{01} - \gamma_{00} + (\gamma_{11} - \gamma_{10})x_1 + (\gamma_{21} - \gamma_{20})x_2.$$

The true parameter values are specified in Table 1.
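A minimal Python sketch of one draw from this design is given below. The parameter values are inputs here because Table 1 is not reproduced in this section, so any numbers passed to the function are placeholders rather than the paper's settings. The conditional variance of the shock is taken to be $1 + z^2 + x_1^2 + x_2^2$, which is an assumption about the garbled source formula, and the function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(n, beta, gamma1, gamma0, rho, rng):
    """Draw n observations (z, x, d, y) from the Monte Carlo design of Section 4.

    beta = (beta_0, beta_1, beta_2); gamma1 and gamma0 hold the intercept and
    slopes of E(y | x, d = 1) and E(y | x, d = 0). All values are user inputs.
    """
    b0, b1, b2 = beta
    gamma1 = np.asarray(gamma1, dtype=float)
    gamma0 = np.asarray(gamma0, dtype=float)
    z = rng.logistic(size=n)
    x1 = rng.uniform(-1.0, 1.0, size=n)
    x2 = rng.beta(2.0, 2.0, size=n)
    # Heteroskedastic shock with median zero (assumed variance formula).
    eps = np.sqrt(1.0 + z**2 + x1**2 + x2**2) * rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x1, x2])
    G = X @ (gamma1 - gamma0)                     # true G(x)
    d = (b0 + z * b1 + G * b2 > eps).astype(int)
    # (u1, u0) jointly normal with unit variances and covariance rho.
    u = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    y = d * (X @ gamma1 + u[:, 0]) + (1 - d) * (X @ gamma0 + u[:, 1])
    return {"z": z, "x": X[:, 1:], "d": d, "y": y, "G": G}
```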

We compare the infeasible single-stage estimator, which uses $(z, G(x))$ as regressors, with
the feasible two-stage estimator, which uses $(z, \hat{G}(x))$ as regressors. We consider both
parametric and nonparametric first stage estimators. For the former, we estimate
$E(y|x, d = j)$ by running OLS of $y$ on $x$ using the $d = j$ subsample. For the latter,
we implement Nadaraya-Watson kernel regression estimators. The nonparametric
estimators of $E(y|x, d = j)$, $j \in \{0, 1\}$, are thus constructed as

$$\frac{\sum_{i=1}^{N} y_i K\left(\hat{\Omega}_j^{-1/2} h_N^{-1}(x - x_i)\right) 1\{d_i = j\}}{\sum_{i=1}^{N} K\left(\hat{\Omega}_j^{-1/2} h_N^{-1}(x - x_i)\right) 1\{d_i = j\}}, \qquad (4.2)$$

where $x_i = (x_{1,i}, x_{2,i})$, $\hat{\Omega}_j$ is the diagonal matrix whose $k$th diagonal element is the
estimated variance of $x_{k,i}$ conditional on $d_i = j$, and $h_N$ is a deterministic bandwidth
sequence. Here, $K(\cdot)$ is a multivariate kernel function of the 12th order (see, e.g.,
Bierens (1987, p. 112) and Andrews (1995, p. 567)) such that

$$K(x) \equiv \sum_{m=1}^{6} a_m b_m^{-2} \exp\left[-x'x/(2b_m^2)\right],$$

where the constants $(a_m, b_m)$, $m \in \{1, ..., 6\}$, satisfy

$$\sum_{m=1}^{6} a_m = 1 \quad \text{and} \quad \sum_{m=1}^{6} a_m b_m^{2l} = 0 \ \text{ for } l \in \{1, ..., 5\}. \qquad (4.3)$$

We specify $b_m = m^{1/2}$ and then obtain $a_m$ as the solution of the system of linear
equations (4.3). The bandwidth $h_N$ is set to be $cN^{-1/36}$ with $c \in \{3, 3.5, 4, 4.5, 5, 5.5, 6\}$.
As noted by Bierens (1987, p. 113), the choice of the constants $(a_m, b_m)$ for the kernel
function is less crucial since its effect on the asymptotic variance of the conditional
mean estimator can be captured via the scale constant $c$ associated with the bandwidth
$h_N$. By Theorem 1(b) of Andrews (1995), the resulting first stage estimator
(4.2) has the convergence property required in (3.1) with $\sigma(k) \le 4$ and $\varepsilon_N = N^{-1/3}$,
thus fulfilling regularity conditions C5 and C11 of Section 3.
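The first stage just described can be sketched in Python as follows. The sketch solves the linear system (4.3) with $b_m = m^{1/2}$, builds the resulting mixture-of-Gaussians kernel for bivariate $x$ (normalizing constants are omitted because they cancel in the ratio), and evaluates a Nadaraya-Watson fit on a single $d = j$ subsample. The function names are hypothetical, and the standardization by sample standard deviations is an assumption about how $\hat{\Omega}_j$ is estimated.

```python
import numpy as np

def kernel_coefficients(n_terms=6):
    """Solve (4.3) for a_m given b_m = sqrt(m), m = 1, ..., n_terms."""
    m = np.arange(1, n_terms + 1, dtype=float)
    # Row l of A holds b_m^(2l) = m^l; row 0 encodes sum_m a_m = 1.
    A = np.vander(m, N=n_terms, increasing=True).T
    rhs = np.zeros(n_terms)
    rhs[0] = 1.0
    return np.linalg.solve(A, rhs), np.sqrt(m)

def higher_order_kernel(u, a, b):
    """K(u) = sum_m a_m b_m^(-2) exp(-u'u / (2 b_m^2)), u of shape (..., 2)."""
    sq = np.sum(u ** 2, axis=-1, keepdims=True)
    return np.sum(a * b ** -2.0 * np.exp(-sq / (2.0 * b ** 2)), axis=-1)

def nadaraya_watson(x_eval, x, y, h, a, b):
    """Kernel regression of y on x within one d = j subsample.

    x_eval: (n_eval, 2) evaluation points; x: (n, 2); y: (n,); h: bandwidth.
    """
    scale = x.std(axis=0, ddof=1)                  # diagonal of Omega_j^(1/2)
    u = (x_eval[:, None, :] - x[None, :, :]) / (scale * h)
    k = higher_order_kernel(u, a, b)               # shape (n_eval, n)
    return (k @ y) / k.sum(axis=1)
```

A higher-order kernel takes negative values, so the denominator is not guaranteed to be positive at every evaluation point; in practice this is one reason for trimming observations near the boundary of the support.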

To implement the second-stage estimator using nonparametric first stage estimators,
we trim the data by setting $\tau_i = 1\{|x_{1i}| \le 1 - \epsilon, \ \epsilon \le x_{2i} \le 1 - \epsilon\}$, where $\tau_i$ is the
weight introduced in (2.5) and the trimming constant $\epsilon$ is set to be 0.01. The estimates of
$\beta_0$, $\beta_1$ and $\beta_2$ are obtained using the grid search method. Since the model (2.2) allows for
identification of preference parameters only up to scale normalization, we report simulation
results of the estimated ratio $\hat{\lambda} \equiv \hat{\beta}_2/\hat{\beta}_1$.

Let $\hat{\lambda}_{single}$, $\hat{\lambda}_{OLS}$ and $\hat{\lambda}_{kernel}$ respectively denote the estimators $\hat{\lambda}$ that are
constructed based on the infeasible single-stage, two-stage (OLS first stage) and
two-stage (kernel regression first stage) preference parameter estimators. We compute
the bias, median, root mean squared error (RMSE), mean absolute deviation (mean AD)
and median absolute deviation (median AD) of these estimators based on 1000
simulation repetitions. Table 2 presents simulation results for $\hat{\lambda}_{single}$ and $\hat{\lambda}_{OLS}$, and Table
3 gives those for $\hat{\lambda}_{kernel}$ for various values of the bandwidth parameter $c$. We find that
there seems to be a systematic downward bias across all the simulation configurations,
including the infeasible single-stage estimation cases. However, the magnitude of
the bias diminishes as the sample size increases. The precision in terms of RMSE, mean
AD and median AD of the two-stage estimators $\hat{\lambda}_{OLS}$ and $\hat{\lambda}_{kernel}$ is quite close to
that of the infeasible single-stage estimator $\hat{\lambda}_{single}$. We notice that simulation results
for $\hat{\lambda}_{kernel}$ do not appear to be very sensitive to the choice of the bandwidth
parameter $c$, though setting $c$ to be around 4.5 tends to yield better overall finite
sample performance. In short, our proposed two-stage maximum score estimator
seems to work well in the simulations.
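For concreteness, the summary statistics reported in the tables could be computed from the simulated estimates as in the short Python sketch below. It assumes that the mean and median absolute deviations are taken around the true value of $\lambda$, which the text does not state explicitly; the function name `summarize` is hypothetical.

```python
import numpy as np

def summarize(estimates, true_value):
    """Bias, median, RMSE, mean AD and median AD of simulated estimates.

    Absolute deviations are measured from true_value (an assumption).
    """
    est = np.asarray(estimates, dtype=float)
    err = est - true_value
    return {
        "bias": err.mean(),
        "median": np.median(est),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mean_ad": np.mean(np.abs(err)),
        "median_ad": np.median(np.abs(err)),
    }
```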

5 Conclusions 

This paper has developed maximum score estimation of preference parameters in 
the binary choice model under uncertainty in which the decision rule is affected by 
conditional expectations. The estimation procedure is implemented in two stages: 
we estimate conditional expectations nonparametrically in the first stage and obtain 
the maximum score estimate of the preference parameters in the second stage using 
the choice data and first stage estimates. The paper has established consistency 
and the rate of convergence of the corresponding two-stage estimator, which is of 
independent interest for non-smooth M-estimation with generated regressors. 

An alternative approach would be to develop the second stage estimator using
Horowitz (1992)'s smoothed maximum score estimator or the Laplace estimator
proposed in Jun, Pinkse, and Wan (2013). These alternative methods would produce
faster convergence rates but require extra tuning parameters. Alternatively, we might 
build the second stage estimator based on Lewbel (2000), who introduced the idea 
of a special regressor satisfying certain conditional independence restriction. These 
are interesting future research topics. 

A Proof of Consistency 

Recall that $w = (z, G(x))$ and $S_N(b, \gamma)$ is the sample score function defined by (2.5).
We first state and prove a preliminary lemma that will be invoked in proving Theorem
1 of the paper.
Lemma 1. Under Assumptions C1, C4 and C5,

$$\sup_{b \in \Theta} \left| S_N(b, \hat{G}) - S_N(b, G) \right| \xrightarrow{p} 0. \qquad (A.1)$$

Proof of Lemma 1. Note that

$$\left| S_N(b, \hat{G}) - S_N(b, G) \right| \le \frac{1}{N} \sum_{i=1}^{N} \tau_i 1\left\{ |(\hat{G}(x_i) - G(x_i))'b_2| \ge |w_i'b| \right\}. \qquad (A.2)$$

By Assumption C1, $\|b_2\|_E \le B_2$ for some finite positive constant $B_2$. Therefore, the
right-hand side of the inequality (A.2) is bounded above by

$$T_N \equiv P_N\left( \tau = 1, \ B_2 \|\hat{G}(x) - G(x)\|_E \ge |w'b| \right), \qquad (A.3)$$

where $P_N$ denotes the empirical probability. Note that the term (A.3) is further
bounded above by

$$P_N\left( B_2 \|\hat{G} - G\|_{\sup} \ge |w'b| \right). \qquad (A.4)$$

Let $E_\eta$ denote the event $\{\|\hat{G} - G\|_{\sup} < \eta\}$ for some $\eta > 0$. Then given $\epsilon > 0$,

$$P\left( \sup_{b \in \Theta} T_N > \epsilon \right) \le P\left( \sup_{b \in \Theta} T_N > \epsilon, \ E_\eta \right) + P(E_\eta^c)$$
$$\le P\left[ \sup_{b \in \Theta} P_N(B_2\eta \ge |w'b|) > \epsilon \right] + P(E_\eta^c).$$

By Assumption C5, $P(E_\eta^c) \to 0$ as $N \to \infty$. Hence, to show (A.1), it remains to
establish that as $N \to \infty$,

$$P\left[ \sup_{b \in \Theta} P_N(B_2\eta \ge |w'b|) > \epsilon \right] \to 0. \qquad (A.5)$$

Note that by Assumption C4, $P(B_2\eta \ge |w'b|) \le 2LB_2\eta$. Therefore, we have that

$$P\left[ \sup_{b \in \Theta} P_N(B_2\eta \ge |w'b|) > \epsilon \right]$$
$$\le P\left[ \sup_{b \in \Theta} \left| P_N(B_2\eta \ge |w'b|) - P(B_2\eta \ge |w'b|) \right| > \epsilon - 2LB_2\eta \right], \qquad (A.6)$$

where $\eta$ is taken to be sufficiently small such that $\epsilon - 2LB_2\eta > 0$ for the given $\epsilon$. By
Lemma 9.6, 9.7 (ii) and 9.12 (i) of Kosorok (2008), the family of sets $\{B_2\eta \ge |w'b|\}$ for
$b \in \Theta$ forms a Vapnik-Cervonenkis class. Therefore, by the Glivenko-Cantelli Theorem
(see, e.g., Theorem 2.4.3 of van der Vaart and Wellner (1996)), the right-hand side of
(A.6) tends to zero as $N \to \infty$. Hence, the convergence result in (A.5) holds and
Lemma 1 thus follows. □

We now prove Theorem 1 for consistency of $\hat{\beta}$.

Proof of Theorem 1. For any $(b, \gamma)$, define

$$S(b, \gamma) \equiv E\left[ \tau(2d - 1) 1\{z'b_1 + \gamma(x)'b_2 > 0\} \right].$$

Given Assumptions C1 - C3 and by Manski (1985, Lemma 3, p. 321), $\beta$ uniquely
satisfies $\beta = \arg\max_{b \in \Theta} S(b, G)$. We now look at the difference

$$\left| S_N(b, \hat{G}) - S(b, G) \right| \le \left| S_N(b, \hat{G}) - S_N(b, G) \right| + \left| S_N(b, G) - S(b, G) \right|, \qquad (A.7)$$

where by Lemma 1, the first term of the right-hand side of (A.7) converges to zero
in probability uniformly over $b \in \Theta$, whilst by Manski (1985, Lemma 4, p. 321), the
second term converges to zero almost surely uniformly over $b \in \Theta$. Therefore, we
have that

$$\sup_{b \in \Theta} \left| S_N(b, \hat{G}) - S(b, G) \right| \xrightarrow{p} 0.$$

By Lemma 5 of Manski (1985, p. 322), $S(b, G)$ is continuous in $b$. Given these results,
Theorem 1 thus follows by application of the consistency theorem in Newey and
McFadden (1994, Theorem 2.1). □
B Lemma on the Rates of Convergence of a Two-Stage M-Estimator
with a Non-smooth Criterion Function

We first present and prove a general lemma establishing the rates of convergence of
a general two-stage M-estimator under high level assumptions. In the next section, we
prove Theorem 2 by verifying these assumptions for the particular estimator given
by (2.6) under the regularity conditions C1 - C11.

To present a general result, let $s \mapsto m_{\theta,h}(s)$ be measurable functions indexed by
parameters $(\theta, h)$. Let $\Theta$ and $H$ be the spaces of parameters $\theta$ and $h$, respectively.
Let $(\theta^*, h^*)$ denote the true parameter value. We assume $(\theta^*, h^*) \in \Theta \times H$. Let
$S_N(\theta, h) \equiv \sum_{i=1}^{N} m_{\theta,h}(s_i)/N$ be the empirical criterion of the M-estimation problem,
where $(s_i)_{i=1}^{N}$ are i.i.d. random vectors. Suppressing the individual index, let
$S(\theta, h) \equiv E[m_{\theta,h}(s)]$ be the population criterion. For a given first stage estimate $\hat{h}$,
let the estimator $\hat{\theta}$ be constructed as

$$\hat{\theta} = \arg\sup_{\theta \in \Theta} S_N(\theta, \hat{h}). \qquad (B.1)$$

Let $d_\Theta(\theta, \theta^*)$ and $d_H(h, h^*)$ be non-negative functions measuring discrepancies
between $\theta$ and $\theta^*$, and $h$ and $h^*$, respectively. Note that $d_\Theta$ and $d_H$ are usually
related to but not necessarily the same as the metrics specified for the spaces $\Theta$ and
$H$. Given a non-stochastic positive real sequence $\varepsilon_N$, define $H_N(C) \equiv \{h \in H :
d_H(h, h^*) \le C\varepsilon_N\}$. To simplify the presentation, we use the notation $\lesssim$ to denote
being bounded above up to a universal constant. Define the recentered criterion

$$\tilde{S}_N(\theta, h) \equiv (S_N(\theta, h) - S_N(\theta^*, h)) - (S(\theta, h) - S(\theta^*, h)). \qquad (B.2)$$

The following lemma modifies the rate of convergence results developed by van der
Vaart (1998, Theorem 5.55) and provides sufficient conditions ensuring that $\hat{\theta}$ retains
the same convergence rate as it would have if $h^*$ were known.
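Lemma 2 (Rate of convergence for a general two-stage M-estimator). For any fixed
and sufficiently large $C > 0$, assume that for all sufficiently large $N$,

$$\sup_{h \in H_N(C)} |S(\theta^*, h) - S(\theta^*, h^*)| \lesssim (C\varepsilon_N)^2 \qquad (B.3)$$

and there is a sequence of non-stochastic functions $e_N : \Theta \times H_N(C) \to \mathbb{R}$ such
that for all sufficiently small $\delta > 0$ and for every $(\theta, h) \in \Theta \times H_N(C)$ satisfying
$d_\Theta(\theta, \theta^*) \le \delta$,

$$S(\theta, h) - S(\theta^*, h^*) + e_N(\theta, h) \lesssim -d_\Theta^2(\theta, \theta^*) + d_H^2(h, h^*), \qquad (B.4)$$

$$\sup_{d_\Theta(\theta, \theta^*) \le \delta, \ (\theta, h) \in \Theta \times H_N(C)} |e_N(\theta, h)| \lesssim C\delta\varepsilon_N, \qquad (B.5)$$

and

$$E\left[ \sup_{d_\Theta(\theta, \theta^*) \le \delta, \ (\theta, h) \in \Theta \times H_N(C)} \left| \tilde{S}_N(\theta, h) \right| \right] \lesssim \frac{\phi_N(\delta)}{\sqrt{N}}, \qquad (B.6)$$

where $\phi_N(\delta)$ is a sequence of functions defined on $(0, \infty)$ and satisfies that $\phi_N(\delta)\delta^{-\alpha}$
is decreasing for some $\alpha < 2$. Suppose $d_H(\hat{h}, h^*) = O_p(\varepsilon_N)$, $d_\Theta(\hat{\theta}, \theta^*) = o_p(1)$ and
there is a non-stochastic positive real sequence $\delta_N$ which tends to zero as $N \to \infty$
and satisfies that $\varepsilon_N \lesssim \delta_N$ and $\phi_N(\delta_N) \le \sqrt{N}\delta_N^2$ for every $N$. Then $d_\Theta(\hat{\theta}, \theta^*) =
O_p(\delta_N)$.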

Proof. Based on the peeling technique of van der Vaart (1998, Theorem 5.55), for
each natural number $N$, integer $j$ and positive real $M$, construct the set

$$A_{N,j,M}(C) \equiv \left\{ (\theta, h) \in \Theta \times H_N(C) : 2^{j-1}\delta_N < d_\Theta(\theta, \theta^*) \le 2^j\delta_N, \ d_H(h, h^*) \le 2^{-M} d_\Theta(\theta, \theta^*) \right\}.$$

Then we have that for any $\epsilon > 0$,

$$P\left( d_\Theta(\hat{\theta}, \theta^*) \ge 2^M \left[ \delta_N + d_H(\hat{h}, h^*) \right], \ \hat{h} \in H_N(C) \right)$$
$$\le P\left( 2d_\Theta(\hat{\theta}, \theta^*) \ge \epsilon \right) + P\left( (\hat{\theta}, \hat{h}) \in \bigcup_{j \ge M, \ 2^j\delta_N \le \epsilon} A_{N,j,M}(C) \right)$$
$$\le P\left( 2d_\Theta(\hat{\theta}, \theta^*) \ge \epsilon \right) + \sum_{j \ge M, \ 2^j\delta_N \le \epsilon} P\left( \sup_{(\theta, h) \in A_{N,j,M}(C)} [S_N(\theta, h) - S_N(\theta^*, h)] \ge 0 \right), \qquad (B.7)$$

where the last inequality follows from the definition of $\hat{\theta}$ given by (B.1). Since
$d_\Theta(\hat{\theta}, \theta^*) = o_p(1)$, the term $P(2d_\Theta(\hat{\theta}, \theta^*) \ge \epsilon)$ tends to zero as $N \to \infty$. Hence the
remaining part of the proof is to bound the terms in the sum (B.7).

Let $N$ be large enough such that (B.3) holds and choose $\epsilon$ to be small enough
such that assumptions (B.4), (B.5) and (B.6) hold for every $\delta \le \epsilon$. Note that for
every sufficiently large $M$, if $(\theta, h) \in A_{N,j,M}(C)$, then $d_H^2(h, h^*) - d_\Theta^2(\theta, \theta^*) \lesssim -\delta_N^2 2^{2j}$,
so that by (B.4),

$$S(\theta, h) - S(\theta^*, h^*) + e_N(\theta, h) \lesssim -\delta_N^2 2^{2j} \qquad (B.8)$$

and thus

$$S_N(\theta, h) - S_N(\theta^*, h) \lesssim \tilde{S}_N(\theta, h) + S(\theta^*, h^*) - S(\theta^*, h) - e_N(\theta, h) - \delta_N^2 2^{2j}.$$

Therefore, by the Markov inequality, each term in the sum (B.7) can be bounded above
by

$$\delta_N^{-2} 2^{-2j} E\left[ \sup_{(\theta, h) \in A_{N,j,M}(C)} \left| \tilde{S}_N(\theta, h) + S(\theta^*, h^*) - S(\theta^*, h) - e_N(\theta, h) \right| \right]. \qquad (B.9)$$

By (B.3), (B.5), (B.6) and the triangle inequality, the term (B.9) is bounded
above by

$$\delta_N^{-2} 2^{-2j} \left[ N^{-1/2}\phi_N(2^j\delta_N) + 2^j C\delta_N\varepsilon_N + (C\varepsilon_N)^2 \right]. \qquad (B.10)$$

By the monotonicity property of the mapping $\delta \mapsto \phi_N(\delta)\delta^{-\alpha}$, we have that $\phi_N(2^j\delta_N) \le
2^{j\alpha}\phi_N(\delta_N)$. Furthermore, since $\phi_N(\delta_N) \le \sqrt{N}\delta_N^2$, the first term in the bracket of
(B.10) can thus be bounded by $2^{j\alpha}\delta_N^2$. Given that $\varepsilon_N \lesssim \delta_N$, the term (B.10) can be
further bounded above by $2^{j(\alpha - 2)} + C2^{-j} + C^2 2^{-2j}$. Using this fact and the condition
$\alpha < 2$, it follows that the sum (B.7) tends to zero as $M \to \infty$.

Since $d_H(\hat{h}, h^*) = O_p(\varepsilon_N)$, $P(\hat{h} \in H_N(C))$ can be made arbitrarily close to 1
by choosing a sufficiently large value of $C$ for every sufficiently large $N$. Therefore,
Lemma 2 follows by putting together all these results and noting that $\delta_N + d_H(\hat{h}, h^*) =
O_p(\delta_N)$. □

C Proof of the Rate of Convergence for $\hat{\beta}$

To establish the convergence rate of /3, we apply Lemma [2] by setting (6, h) = (6,7), 
(0*, h*) = ((3, G), 6 = {-1, 1} x T, H = A, s = (r, d, z, x) and 

m b)1 (s) = r(2d - l)l{z'bi + i{x)'b 2 > 0}. 
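To make the role of $m_{b,\gamma}$ concrete, the following is a minimal sketch of the sample average of $m_{b,\gamma}(s_i)$ that the second stage maximizes over $b$, with the first-stage estimate plugged in for $\gamma$. The function name `max_score_objective`, the argument names and the toy data are illustrative assumptions and do not come from the paper; the first-stage fitted values are taken as given.

```python
import numpy as np

def max_score_objective(b1, b2, tau, d, z, gamma_hat):
    """Sample average of tau * (2d - 1) * 1{z'b1 + gamma_hat(x)'b2 > 0}.

    b1        : (k,) coefficients on z (first entry normalized to +1 or -1)
    b2        : (p,) coefficients on the generated regressors
    tau       : (n,) trimming weights
    d         : (n,) binary choices in {0, 1}
    z         : (n, k) observed regressors
    gamma_hat : (n, p) first-stage fitted values gamma_hat(x_i) (assumed given)
    """
    index = z @ b1 + gamma_hat @ b2
    return np.mean(tau * (2 * d - 1) * (index > 0))

# Toy data (purely illustrative).
z = np.array([[1.0, 0.5], [-1.0, 0.2], [0.3, -0.4]])
gamma_hat = np.array([[0.2], [0.1], [-0.5]])
tau = np.array([1.0, 1.0, 0.0])
d = np.array([1, 0, 1])

# Compare the two admissible signs of the normalized first coefficient.
for sign in (1.0, -1.0):
    value = max_score_objective(np.array([sign, 0.5]), np.array([1.0]),
                                tau, d, z, gamma_hat)
    print(sign, value)
```

Because the objective is a step function of $b$, a practical implementation would search over $b$ by grid or combinatorial methods rather than by gradient-based optimization.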



Assumptions (B.3), (B.4), (B.5) and (B.6) of Lemma 2 are non-trivial and will be verified using primitive conditions C1 - C11 of the model. Assumption (B.4) is concerned with the quadratic expansion of $S(b,\gamma)$ around $(\beta,G)$ by which we obtain the functional form of $e_N(b,\gamma)$. Recall that $w = (z,G(x))$, $z = (z_1,\tilde z)$, $\tilde w = (\tilde z,G(x))$, $b_1 = (b_{1,1},\tilde b_1)$, $\beta_1 = (\beta_{1,1},\tilde\beta_1)$, $b = (b_1,b_2)$ and $\beta = (\beta_1,\beta_2)$. The following lemma will be used to establish the expansion of the population criterion $S(b,\gamma)$.

Lemma 3. Under conditions C3 and C7, the sign of $p_1(-\tilde w'\tilde\beta/\beta_{1,1},\tilde z,x)$ is the same as that of $\beta_{1,1}$ for almost every $(\tilde z,x)$.

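Proof. Note that the model (2.2) implies that

$$ P(d = 1 | z,x) = F_\varepsilon(w'\beta | z,x). $$

Thus, by C7(ii), $P(d = 1 | z,x)$ is differentiable with respect to $z_1$ and

$$ \frac{\partial}{\partial z_1} P(d = 1 | z,x) = \beta_{1,1} \frac{\partial}{\partial t} F_\varepsilon(t | z,x) \Big|_{t = w'\beta} + \frac{\partial}{\partial z_1} F_\varepsilon(t | z,x) \Big|_{t = w'\beta}. $$

Consider the mapping

$$ z_1 \mapsto h(z_1) = \frac{\partial}{\partial z_1} F_\varepsilon(t | z,x) \Big|_{t = z_1\beta_{1,1} + \tilde w'\tilde\beta}. $$

By C3, $h(-\tilde w'\tilde\beta/\beta_{1,1}) = 0$ for almost every $(\tilde z,x)$. Therefore, Lemma 3 follows from this fact and the monotonicity of $F_\varepsilon(t | z,x)$ in the argument $t$. □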

By assumption C1, the space of the coefficient $b_{1,1}$ is $\{-1,1\}$ and thus $b_{1,1} = \beta_{1,1}$ when $\|b - \beta\|_E < \delta$ for $\delta$ small enough. Let $p(z,x) = P(d = 1 | z,x)$ and

$$ S_1(b,\gamma) = E\left[ \tau(2p(z,x) - 1)\,1\{ z_1\beta_{1,1} + \tilde z'\tilde b_1 + \gamma(x)'b_2 > 0 \} \right]. \qquad \text{(C.1)} $$

We now derive the quadratic expansion of $S_1(b,\gamma)$ around $(\beta,G)$.



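Lemma 4. For sufficiently small $\|b - \beta\|_E$ and $\|\gamma - G\|_\infty$ and under conditions C3, C7, C8 and C9, we have that

$$ \left| S_1(\beta,\gamma) - S_1(\beta,G) \right| \lesssim \|\gamma - G\|_\infty^2 $$

and there are constants $c_1 > 0$ and $c_2 > 0$ such that

$$ S_1(b,\gamma) - S_1(\beta,G) + e(b,\gamma) \lesssim -c_1 \|b - \beta\|_E^2 + c_2 \|\gamma - G\|_\infty^2 $$

for some function $e(b,\gamma)$ that satisfies

$$ \left| e(b,\gamma) \right| \lesssim \|b - \beta\|_E \, \|\gamma - G\|_\infty. $$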



Proof. We prove Lemma 4 explicitly for the case $\beta_{1,1} = 1$. The proof for the case $\beta_{1,1} = -1$ can be done by similar arguments.

Suppose now $\beta_{1,1} = 1$. Then

$$ S_1(b,\gamma) - S_1(\beta,G) = E\left( \tau(2p(z,x) - 1) \left[ 1\{ z_1 + \tilde z'\tilde\beta_1 + G(x)'\beta_2 < 0 \} - 1\{ z_1 + \tilde z'\tilde b_1 + \gamma(x)'b_2 < 0 \} \right] \right). $$



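Let

$$
\begin{aligned}
\lambda(t) &= \tilde z'\left( \tilde\beta_1 + t(\tilde b_1 - \tilde\beta_1) \right) + \left( G(x) + t(\gamma(x) - G(x)) \right)'\left( \beta_2 + t(b_2 - \beta_2) \right), \\
\Psi(t) &= -E\left( \tau(2p(z,x) - 1)\,1\{ z_1 + \lambda(t) < 0 \} \right).
\end{aligned}
$$

The first-order and second-order derivatives of $\Psi(t)$ are derived as follows:

$$
\begin{aligned}
\Psi'(t) &= E\left( \tau\lambda'(t)\left( 2p(-\lambda(t),\tilde z,x) - 1 \right) g_1(-\lambda(t) | \tilde z,x) \right), \\
\Psi''(t) &= -E\Big\{ \tau(\lambda'(t))^2 \Big[ 2p_1(-\lambda(t),\tilde z,x)\, g_1(-\lambda(t) | \tilde z,x) + \left( 2p(-\lambda(t),\tilde z,x) - 1 \right) \frac{\partial}{\partial z_1} g_1(-\lambda(t) | \tilde z,x) \Big] \Big\} \\
&\quad + E\left( 2\tau\left[ 2p(-\lambda(t),\tilde z,x) - 1 \right] g_1(-\lambda(t) | \tilde z,x)\,(\gamma(x) - G(x))'(b_2 - \beta_2) \right).
\end{aligned}
$$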

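Then the second order expansion of $S_1(b,\gamma) - S_1(\beta,G)$ takes the form

$$ \Psi'(0) + \Psi''(0)/2 + o\left( \max\left\{ \|b - \beta\|_E^2, \|\gamma - G\|_\infty^2 \right\} \right) $$

where by C7 and C8, the remainder term has the stated order uniformly over $b$ and $\gamma$. Given assumption C3, it follows that $p(-\tilde w'\tilde\beta,\tilde z,x) = 1/2$ for almost every $(\tilde z,x)$, so that $\Psi'(0) = 0$. Let

$$ \kappa(\tilde z,x) = 2p_1(-\tilde w'\tilde\beta,\tilde z,x)\, g_1(-\tilde w'\tilde\beta | \tilde z,x). $$

Then we have that

$$
\begin{aligned}
\Psi'(0) + \Psi''(0)/2 &= -E\left[ \tau\kappa(\tilde z,x) \left( \tilde w'(\tilde b - \tilde\beta) + (\gamma(x) - G(x))'\beta_2 \right)^2 \right] \\
&= -A_1 - A_2 - e(b,\gamma)
\end{aligned}
$$

where

$$
\begin{aligned}
A_1 &= (\tilde b - \tilde\beta)' E\left( \tau\kappa(\tilde z,x)\tilde w\tilde w' \right)(\tilde b - \tilde\beta), \\
A_2 &= E\left( \tau\kappa(\tilde z,x)(\gamma(x) - G(x))'\beta_2\beta_2'(\gamma(x) - G(x)) \right), \\
e(b,\gamma) &= 2(\tilde b - \tilde\beta)' E\left( \tau\kappa(\tilde z,x)\tilde w\beta_2'(\gamma(x) - G(x)) \right).
\end{aligned}
$$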



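Under condition C9, $E(\tau\kappa(\tilde z,x)\tilde w\tilde w')$ is positive definite, so that $A_1 \ge c_1 \|b - \beta\|_E^2$ for some positive real constant $c_1$. By Lemma 3, $p_1(-\tilde w'\tilde\beta,\tilde z,x) > 0$ and thus $\kappa(\tilde z,x) > 0$. By Cauchy-Schwarz inequality, $0 \le A_2 \le c_2 \|\gamma - G\|_\infty^2$, where $c_2 = E(\tau\kappa(\tilde z,x)) \|\beta_2\|_E^2 > 0$, and the function $e(b,\gamma)$ satisfies that

$$
\begin{aligned}
\left| e(b,\gamma) \right| &\le 2E\left( \tau\kappa(\tilde z,x) \left| (\tilde b - \tilde\beta)'\tilde w\beta_2'(\gamma(x) - G(x)) \right| \right) \\
&\le 2E\left( \tau\kappa(\tilde z,x) \|\tilde w\|_E \right) \|\beta_2\|_E \, \|b - \beta\|_E \, \|\gamma - G\|_\infty.
\end{aligned}
$$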



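Hence Lemma 4 follows by noting that when $\|b - \beta\|_E$ and $\|\gamma - G\|_\infty$ are sufficiently small,

$$ \left| S_1(\beta,\gamma) - S_1(\beta,G) \right| = \left| -A_2 + o\left( \|\gamma - G\|_\infty^2 \right) \right| \lesssim c_2 \|\gamma - G\|_\infty^2 $$

and

$$ S_1(b,\gamma) - S_1(\beta,G) + e(b,\gamma) \lesssim -A_1 + A_2 \lesssim -c_1 \|b - \beta\|_E^2 + c_2 \|\gamma - G\|_\infty^2. $$

□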



We now verify assumption (B.6) of Lemma 2. Note that for $\delta$ sufficiently small, assumption C1 implies that $b_{1,1} = \beta_{1,1}$ when $\|b - \beta\|_E < \delta$. Therefore we can focus on analyzing (B.6) for the case of $b_{1,1} = \beta_{1,1}$ and $\|b - \beta\|_E < \delta$. For any $s = (\tau,d,z,x)$, consider the following recentered function

$$ \tilde m_{b,\gamma}(s) = \tau(2d-1) \left[ 1\{ z_1\beta_{1,1} + \tilde z'\tilde b_1 + \gamma(x)'b_2 > 0 \} - 1\{ z_1\beta_{1,1} + \tilde z'\tilde\beta_1 + \gamma(x)'\beta_2 > 0 \} \right] \qquad \text{(C.2)} $$

and the class of functions

$$ F_{\delta,\varepsilon} = \left\{ \tilde m_{b,\gamma} : \|b - \beta\|_E \le \delta,\ \|\gamma - G\|_{\alpha,p} \le \varepsilon \right\}. \qquad \text{(C.3)} $$



Let $\|\cdot\|_{L_r(P)}$ denote the $L_r(P)$ norm such that $\|f\|_{L_r(P)} = \left[ E(|f(\tau,d,z,x)|^r) \right]^{1/r}$ for any measurable function $f$. For any $\epsilon > 0$, let $N_{[]}(\epsilon,F,L_r(P))$ denote the $L_r(P)$-bracketing number for a given function space $F$. Namely, $N_{[]}(\epsilon,F,L_r(P))$ is the minimum number of $L_r(P)$-brackets of length $\epsilon$ required to cover $F$ (see e.g., van der Vaart (1998, p. 270)). The logarithm of the bracketing number for $F$ is referred to as the bracketing entropy for $F$. Assumption (B.6) is a stochastic equicontinuity condition concerning the complexity of the function space $F_{\delta,\varepsilon}$ in terms of its envelope function and bracketing entropy. Let $M_{\delta,\varepsilon}$ denote an envelope for $F_{\delta,\varepsilon}$ such that $|\tilde m_{b,\gamma}(s)| \le |M_{\delta,\varepsilon}(s)|$ for all $s$ and for all $\tilde m_{b,\gamma} \in F_{\delta,\varepsilon}$. The next lemma derives the envelope function $M_{\delta,\varepsilon}$.

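Lemma 5. Let $\delta$ and $\varepsilon$ be sufficiently small. Then under conditions C1, C4, C6 and C10, for some real constants $a_1 > 0$ and $a_2 > 0$, we can take

$$ M_{\delta,\varepsilon} = 1\{ a_1 \max\{\delta,\varepsilon\} \ge |w'\beta| \} $$

and furthermore,

$$ \| M_{\delta,\varepsilon} \|_{L_2(P)} \le a_2 \sqrt{\max\{\delta,\varepsilon\}}. \qquad \text{(C.4)} $$

Proof. Note that

$$
\begin{aligned}
\left| \tilde m_{b,\gamma}(\tau,d,z,x) \right| \le 1\{ & z_1\beta_{1,1} + \tilde z'\tilde b_1 + \gamma(x)'b_2 > 0 \ge z'\beta_1 + \gamma(x)'\beta_2 \ \text{ or } \\
& z'\beta_1 + \gamma(x)'\beta_2 > 0 \ge z_1\beta_{1,1} + \tilde z'\tilde b_1 + \gamma(x)'b_2 \}.
\end{aligned}
$$

Under condition C6, there is a positive real constant $B$ such that $\|\tilde z\|_E < B$ with probability 1. Hence if $\|b - \beta\|_E \le \delta$ and $\|\gamma - G\|_{\alpha,p} \le \varepsilon$, then we have that

$$
\begin{aligned}
& z_1\beta_{1,1} + \tilde z'\tilde b_1 + \gamma(x)'b_2 > 0 \ge z'\beta_1 + \gamma(x)'\beta_2 \\
& \iff \tilde z'(\tilde b_1 - \tilde\beta_1) + \gamma(x)'(b_2 - \beta_2) > -\left[ z'\beta_1 + \gamma(x)'\beta_2 \right] \ge 0 \\
& \implies \delta\left( \|\tilde z\|_E + \|\gamma\|_\infty \right) > -\left[ z'\beta_1 + \gamma(x)'\beta_2 \right] \ \text{ and } \ 0 \ge w'\beta + (\gamma(x) - G(x))'\beta_2 \\
& \implies w'\beta + (\gamma(x) - G(x))'\beta_2 > -\delta\left[ \|\tilde z\|_E + \varepsilon + \|G\|_\infty \right] \ \text{ and } \ \varepsilon\|\beta_2\|_E \ge w'\beta \\
& \implies \delta\left[ B + \varepsilon + \|G\|_\infty \right] + \varepsilon\|\beta_2\|_E \ge w'\beta > -\delta\left[ B + \varepsilon + \|G\|_\infty \right] - \varepsilon\|\beta_2\|_E.
\end{aligned}
$$

Based on similar arguments, it also follows that

$$
\begin{aligned}
& z'\beta_1 + \gamma(x)'\beta_2 > 0 \ge z_1\beta_{1,1} + \tilde z'\tilde b_1 + \gamma(x)'b_2 \\
& \implies \delta\left[ B + \varepsilon + \|G\|_\infty \right] + \varepsilon\|\beta_2\|_E \ge w'\beta > -\delta\left[ B + \varepsilon + \|G\|_\infty \right] - \varepsilon\|\beta_2\|_E.
\end{aligned}
$$

Therefore, Lemma 5 follows by noting that for $\varepsilon$ sufficiently small (e.g., $\varepsilon < 1$), we can take

$$ M_{\delta,\varepsilon} = 1\{ a_1 \max\{\delta,\varepsilon\} \ge |w'\beta| \} $$

where $a_1 = 2\max\{ (B + 1 + \|G\|_\infty), \|\beta_2\|_E \}$. By C1 and C10, $0 < a_1 < \infty$ and hence by C4, $\| M_{\delta,\varepsilon} \|_{L_2(P)} \le a_2 \sqrt{\max\{\delta,\varepsilon\}}$ with $a_2 = \sqrt{2 a_1 L}$ where $L$ is the positive constant stated in condition C4. □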

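The following lemma establishes the bound for the bracketing entropy for $F_{\delta,\varepsilon}$.

Lemma 6. Given conditions C1, C4, C6, C7, C8 and C10, we have that

$$ \log N_{[]}(\epsilon, F_{\delta,\varepsilon}, L_2(P)) \lesssim \sqrt{\max\{\delta,\varepsilon\}} / \epsilon $$

for sufficiently small $\delta$ and $\varepsilon$ and for $\epsilon \le a_2 \sqrt{\max\{\delta,\varepsilon\}}$ where $a_2$ is the constant stated in (C.4).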



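Proof. For $j \in \{1,\ldots,p\}$, let $\Lambda_j(\varepsilon)$ and $\Lambda_j\mathcal{B}_j(\delta,\varepsilon)$ be classes of functions defined as

$$
\begin{aligned}
\Lambda_j(\varepsilon) &= \left\{ (\gamma_j - G_j)/\varepsilon : \|\gamma_j - G_j\|_\alpha \le \varepsilon \right\}, \\
\Lambda_j\mathcal{B}_j(\delta,\varepsilon) &= \left\{ (\gamma_j(x) - G_j(x))(b_{2,j} - \beta_{2,j})/(\varepsilon\delta) : \|\gamma_j - G_j\|_\alpha \le \varepsilon,\ |b_{2,j} - \beta_{2,j}| \le \delta \right\}.
\end{aligned}
$$

Assumption C10 implies that both $\Lambda_j(\varepsilon)$ and $\Lambda_j\mathcal{B}_j(\delta,\varepsilon)$ are $C_1^\alpha$ with $\alpha > 2q$. By Corollary 2.7.2 of van der Vaart and Wellner (1996, p. 157), we have that for $j \in \{1,\ldots,p\}$,

$$ \log N_{[]}(\epsilon^2, \Lambda_j(\varepsilon), L_1(P)) \lesssim \epsilon^{-2q/\alpha} \quad \text{and} \quad \log N_{[]}(\epsilon^2, \Lambda_j\mathcal{B}_j(\delta,\varepsilon), L_1(P)) \lesssim \epsilon^{-2q/\alpha}. \qquad \text{(C.5)} $$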



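Note that for $s = (\tau,d,z,x)$, $\tilde m_{b,\gamma}(s)$ defined by (C.2) can be rewritten as

$$ \tilde m_{b,\gamma}(s) = \tau d \left[ 1\{ h(s;b) > 0 \} - 1\{ h(s;\beta) > 0 \} \right] + \tau(1-d) \left[ 1\{ h(s;b) \le 0 \} - 1\{ h(s;\beta) \le 0 \} \right] $$

where

$$
\begin{aligned}
h(s;b) &= w'\beta + w'(b - \beta) + (\gamma(x) - G(x))'(b_2 - \beta_2) + (\gamma(x) - G(x))'\beta_2, \\
h(s;\beta) &= w'\beta + (\gamma(x) - G(x))'\beta_2.
\end{aligned}
$$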



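Consider the following spaces:

$$
\begin{aligned}
\Theta_1 &= \left\{ w'(b - \beta) : \|b - \beta\|_E \le \delta \right\}, \\
\Theta_{2,j} &= \left\{ (\gamma_j(x) - G_j(x))(b_{2,j} - \beta_{2,j}) : \|\gamma_j - G_j\|_\alpha \le \varepsilon,\ |b_{2,j} - \beta_{2,j}| \le \delta \right\} \ \text{ for } j \in \{1,\ldots,p\}, \\
\Theta_2 &= \left\{ (\gamma(x) - G(x))'(b_2 - \beta_2) : \|\gamma - G\|_{\alpha,p} \le \varepsilon,\ \|b_2 - \beta_2\|_E \le \delta \right\}, \\
\Theta_{3,j} &= \left\{ (\gamma_j(x) - G_j(x))\beta_{2,j} : \|\gamma_j - G_j\|_\alpha \le \varepsilon \right\} \ \text{ for } j \in \{1,\ldots,p\}, \\
\Theta_3 &= \left\{ (\gamma(x) - G(x))'\beta_2 : \|\gamma - G\|_{\alpha,p} \le \varepsilon \right\}, \\
\Theta_4 &= \left\{ h(\tau,d,z,x;b) - w'\beta : \|\gamma - G\|_{\alpha,p} \le \varepsilon,\ \|b - \beta\|_E \le \delta \right\}.
\end{aligned}
$$

Let $n_i(\epsilon) = \log N_{[]}(\epsilon, \Theta_i, L_1(P))$ for $i \in \{1,2,3,4\}$ and $n_{k,j}(\epsilon) = \log N_{[]}(\epsilon, \Theta_{k,j}, L_1(P))$ for $(k,j) \in \{2,3\} \times \{1,\ldots,p\}$. Note that $\Theta_1$ is a pointwise Lipschitz class of functions with envelope $\|w\|_E \delta$, and by condition C8, $E(\|w\|_E)$ is finite. Thus applying Theorem 2.7.11 of van der Vaart and Wellner (1996, p. 164), we have that

$$ n_1(\epsilon^2) \lesssim \log(\delta/\epsilon^2) \lesssim \sqrt{\delta}/\epsilon \le \sqrt{\max\{\delta,\varepsilon\}}/\epsilon. \qquad \text{(C.6)} $$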

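Note that for any norm $\|\cdot\|$, any fixed real valued $c$, and any class of functions $F$, it is straightforward to verify that

$$
\begin{aligned}
N_{[]}(\epsilon, cF, \|\cdot\|) &= 1 \quad \text{for } c = 0, \\
N_{[]}(\epsilon, cF, \|\cdot\|) &\le N_{[]}(\epsilon/|c|, F, \|\cdot\|) \quad \text{for } c \ne 0,
\end{aligned}
$$

where $cF = \{ cf : f \in F \}$.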

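Using this fact, we have that $n_{2,j}(\epsilon^2) = \log N_{[]}(\epsilon^2/(\varepsilon\delta), \Lambda_j\mathcal{B}_j(\delta,\varepsilon), L_1(P))$ and $n_{3,j}(\epsilon^2) = 0$ for $\beta_{2,j} = 0$ and $n_{3,j}(\epsilon^2) \le \log N_{[]}(\epsilon^2/(\varepsilon|\beta_{2,j}|), \Lambda_j(\varepsilon), L_1(P))$ for $\beta_{2,j} \ne 0$. Hence for sufficiently small $\delta$ and $\varepsilon$ (e.g., $\delta < 1$ and $\varepsilon < 1$) and by (C.5), it follows that

$$ n_{2,j}(\epsilon^2) \le \log N_{[]}\left( \left( \epsilon/\sqrt{\max\{\delta,\varepsilon\}} \right)^2, \Lambda_j\mathcal{B}_j(\delta,\varepsilon), L_1(P) \right) \lesssim \left( a_2\sqrt{\max\{\delta,\varepsilon\}}/\epsilon \right)^{2q/\alpha}. $$

Since $\alpha > 2q$, we have that $n_{2,j}(\epsilon^2) \lesssim \sqrt{\max\{\delta,\varepsilon\}}/\epsilon$ for $\epsilon \le a_2\sqrt{\max\{\delta,\varepsilon\}}$. Using similar arguments, we can also deduce that $n_{3,j}(\epsilon^2) \lesssim \sqrt{\max\{\delta,\varepsilon\}}/\epsilon$ for $\epsilon \le a_2\sqrt{\max\{\delta,\varepsilon\}}$.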

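By preservation of bracketing metric entropy (see, e.g., Lemma 9.25 of Kosorok (2008, p. 169)), we have that for $i \in \{2,3\}$,

$$ n_i(\epsilon) \le n_{i,p}(\epsilon 2^{1-p}) + \sum_{j=1}^{p-1} n_{i,j}(\epsilon 2^{-j}) $$

and $n_4(\epsilon) \le n_1(\epsilon/2) + n_2(\epsilon/4) + n_3(\epsilon/4)$. Therefore by the bounds derived above, it follows that $n_2(\epsilon^2) \lesssim \sqrt{\max\{\delta,\varepsilon\}}/\epsilon$, $n_3(\epsilon^2) \lesssim \sqrt{\max\{\delta,\varepsilon\}}/\epsilon$ and also $n_4(\epsilon^2) \lesssim \sqrt{\max\{\delta,\varepsilon\}}/\epsilon$.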

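Now let $f_1^L \le f_1^U, \ldots, f^L_{N_{[]}(\epsilon^2,\Theta_3,L_1(P))} \le f^U_{N_{[]}(\epsilon^2,\Theta_3,L_1(P))}$ and $g_1^L \le g_1^U, \ldots, g^L_{N_{[]}(\epsilon^2,\Theta_4,L_1(P))} \le g^U_{N_{[]}(\epsilon^2,\Theta_4,L_1(P))}$ be the $\epsilon^2$-brackets with bracket length defined by $L_1(P)$ for the spaces $\Theta_3$ and $\Theta_4$, respectively. For $1 \le k \le N_{[]}(\epsilon^2,\Theta_3,L_1(P))$ and $1 \le j \le N_{[]}(\epsilon^2,\Theta_4,L_1(P))$, define

$$
\begin{aligned}
m^U_{jk}(\tau,d,z,x) &= \tau d \left[ 1\{ w'\beta + g_j^U(z,x) > 0 \} - 1\{ w'\beta + f_k^L(z,x) > 0 \} \right] \\
&\quad + \tau(1-d) \left[ 1\{ w'\beta + g_j^L(z,x) \le 0 \} - 1\{ w'\beta + f_k^U(z,x) \le 0 \} \right], \\
m^L_{jk}(\tau,d,z,x) &= \tau d \left[ 1\{ w'\beta + g_j^L(z,x) > 0 \} - 1\{ w'\beta + f_k^U(z,x) > 0 \} \right] \\
&\quad + \tau(1-d) \left[ 1\{ w'\beta + g_j^U(z,x) \le 0 \} - 1\{ w'\beta + f_k^L(z,x) \le 0 \} \right].
\end{aligned}
$$

Note that

$$ 0 \le m^U_{jk} - m^L_{jk} \le 2 \left[ 1\{ g_j^L \le -w'\beta < g_j^U \} + 1\{ f_k^L \le -w'\beta < f_k^U \} \right]. $$

Thus

$$ E\left( (m^U_{jk} - m^L_{jk})^2 \right) \le 12 P\left( g_j^L \le -w'\beta < g_j^U \right) + 4 P\left( f_k^L \le -w'\beta < f_k^U \right). \qquad \text{(C.7)} $$

By condition C1 and given $(\tilde z,x)$, the mapping $z_1 \mapsto w'\beta$ is one-to-one. Hence by condition C7, the density of $w'\beta$ conditional on $(\tilde z,x)$ is bounded and by (C.7), it then follows that $\| m^U_{jk} - m^L_{jk} \|_{L_2(P)} \lesssim \epsilon$. Moreover for each $\tilde m_{b,\gamma} \in F_{\delta,\varepsilon}$, there is a bracket $[m^L_{jk}, m^U_{jk}]$ in which it lies. Therefore,

$$ \log N_{[]}(\epsilon, F_{\delta,\varepsilon}, L_2(P)) \le n_3(\epsilon^2) + n_4(\epsilon^2) \lesssim \sqrt{\max\{\delta,\varepsilon\}}/\epsilon. $$

□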

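Replacing $(\theta,h)$ and $\theta^*$ with $((\beta_{1,1},\tilde b),\gamma)$ and $(\beta_{1,1},\tilde\beta)$, respectively in the definition of $\mathbb{S}_N$ given by (B.2), we now verify assumption (B.6) in the next lemma.

Lemma 7. For sufficiently small $\delta$ and $\varepsilon$, under conditions C1, C4, C6, C7, C8 and C10,

$$ E\Big[ \sup_{\|b - \beta\|_E \le \delta,\ \|\gamma - G\|_{\alpha,p} \le \varepsilon} \left| \mathbb{S}_N(b,\gamma) \right| \Big] \lesssim \sqrt{\frac{\max\{\delta,\varepsilon\}}{N}}. $$

Proof. By Lemmas 5 and 6, we have that

$$
\begin{aligned}
\int_0^{\| M_{\delta,\varepsilon} \|_{L_2(P)}} \sqrt{\log N_{[]}(\epsilon, F_{\delta,\varepsilon}, L_2(P))}\, d\epsilon
&\le \int_0^{a_2\sqrt{\max\{\delta,\varepsilon\}}} \sqrt{\log N_{[]}(\epsilon, F_{\delta,\varepsilon}, L_2(P))}\, d\epsilon \\
&\lesssim \sqrt{\max\{\delta,\varepsilon\}}.
\end{aligned}
$$

Lemma 7 hence follows by applying Corollary 19.35 of van der Vaart (1998, p. 288).

□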
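The entropy integral in the proof of Lemma 7 can be evaluated in closed form once the bound of Lemma 6 is substituted. The short computation below is a sketch that writes $m = \max\{\delta,\varepsilon\}$ for brevity (a shorthand introduced here only) and ignores the multiplicative constant hidden in the entropy bound.

```latex
\int_0^{a_2 \sqrt{m}} \left( \frac{\sqrt{m}}{\epsilon} \right)^{1/2} d\epsilon
  = m^{1/4} \left[ 2\,\epsilon^{1/2} \right]_0^{a_2 m^{1/2}}
  = 2 \sqrt{a_2}\; m^{1/2},
\qquad m = \max\{\delta, \varepsilon\}.
```

The integrand is singular at zero but integrable, which is why an entropy bound of order $1/\epsilon$ is still sufficient for the maximal inequality.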
We now prove Theorem 2.

Proof of Theorem 2. We take $\delta_N = N^{-1/3}$, $d_\Theta(b,\beta) = \sqrt{c_1}\,\|b - \beta\|_E$ and $d_H(\gamma,G) = \sqrt{c_2}\,\|\gamma - G\|_{\alpha,p}$ in the application of Lemma 2 where $c_1$ and $c_2$ are real constants stated in Lemma 4.

Since $c_1 > 0$, the norm by the metric $d_\Theta(\cdot,\cdot)$ is equivalent to the Euclidean norm and thus by Theorem 1, $d_\Theta(\hat\beta,\beta) = o_p(1)$. Moreover since $c_2 > 0$, assumption C11 implies that $d_H(\hat G,G) = O_p(\varepsilon_N)$. Given assumption C1, for sufficiently small $\delta$, we have that $b_{1,1} = \beta_{1,1}$ when $d_\Theta(b,\beta) \le \delta$. Hence for sufficiently small $\delta$ and $\varepsilon_N$, by Lemma 4 and noting that $\|\cdot\|_{\alpha,p}$ is stronger than $\|\cdot\|_\infty$, assumptions (B.3), (B.4) and (B.5) hold.

By Lemma 7 and by taking $C$ sufficiently large in the definition of $H_N(C)$ of Lemma 2, assumption (B.6) also holds with $\phi_N(\delta) = \sqrt{\max\{\delta,\varepsilon_N\}}$. Clearly, $\phi_N(\delta)\delta^{-\alpha}$ is decreasing for some $\alpha < 2$. By assumption C11, $\varepsilon_N \le \delta_N$ and thus $\phi_N(\delta_N) \le \sqrt{N}\delta_N^2$ for every $N$. Therefore, all conditions stated in Lemma 2 are fulfilled and the result of Theorem 2 hence follows. □
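The cube-root rate can be read off directly from the rate condition of Lemma 2. The following worked step is a sketch under the stated condition $\varepsilon_N \le \delta_N$, in which case $\phi_N(\delta_N) = \delta_N^{1/2}$.

```latex
\phi_N(\delta_N) \le \sqrt{N}\,\delta_N^{2}
\iff \delta_N^{1/2} \le N^{1/2}\,\delta_N^{2}
\iff \delta_N^{-3/2} \le N^{1/2}
\iff \delta_N \ge N^{-1/3}.
```

The choice $\delta_N = N^{-1/3}$ is therefore the smallest sequence satisfying the condition, and for $\delta \ge \varepsilon_N$ the map $\delta \mapsto \phi_N(\delta)\delta^{-\alpha} = \delta^{1/2-\alpha}$ is decreasing for any $\alpha \in (1/2, 2)$.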
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Table 1 : Parameter configuration in the DGP 



Parameter   $\theta_0$   $\theta_1$   $\theta_2$   $\gamma_{01}$   $\gamma_{11}$   $\gamma_{21}$   $\gamma_{00}$   $\gamma_{10}$   $\gamma_{20}$
Value                    1            1            0.2             0.15            0.1             0.1             0.08            0.4             -0.8



Table 2: Simulation results for the single-stage estimator and the two-stage estimator with OLS first stage

N       Bias      RMSE     Median    mean AD   median AD

Single-stage estimation
300     -0.058    0.199    0.890     0.184     0.225
500     -0.048    0.190    0.928     0.172     0.202
1000    -0.040    0.184    0.942     0.166     0.187

Two-stage estimation: OLS first stage
300     -0.084    0.199    0.839     0.183     0.223
500     -0.070    0.191    0.876     0.174     0.205
1000    -0.055    0.187    0.901     0.170     0.194

Table 3: Simulation results for the two-stage estimator with kernel first stage

c       Bias      RMSE     Median    mean AD   median AD

Two-stage estimation: kernel first stage (N = 300)
3       -0.112    0.195    0.818     0.178     0.206
3.5     -0.097    0.193    0.842     0.175     0.207
4       -0.087    0.194    0.845     0.177     0.211
4.5     -0.078    0.195    0.856     0.178     0.216
5       -0.071    0.198    0.862     0.180     0.220
5.5     -0.071    0.202    0.848     0.187     0.230
6       -0.085    0.202    0.835     0.186     0.234

Two-stage estimation: kernel first stage (N = 500)
3       -0.100    0.190    0.841     0.173     0.200
3.5     -0.088    0.192    0.849     0.175     0.207
4       -0.074    0.189    0.873     0.170     0.198
4.5     -0.062    0.190    0.893     0.172     0.196
5       -0.065    0.197    0.872     0.180     0.218
5.5     -0.058    0.193    0.901     0.175     0.209
6       -0.078    0.198    0.858     0.182     0.223

Two-stage estimation: kernel first stage (N = 1000)
3       -0.081    0.186    0.862     0.168     0.195
3.5     -0.077    0.186    0.867     0.168     0.191
4       -0.056    0.189    0.903     0.171     0.197
4.5     -0.055    0.183    0.908     0.165     0.183
5       -0.058    0.188    0.899     0.171     0.194
5.5     -0.058    0.187    0.902     0.169     0.195
6       -0.062    0.190    0.897     0.172     0.199
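The column headings of the simulation tables (Bias, RMSE, Median, mean AD, median AD) can be computed from a vector of Monte Carlo estimates as in the sketch below. It assumes that "AD" denotes the absolute deviation of an estimate from the true parameter value; that reading, the function name `summarize_draws` and the sample numbers are illustrative assumptions rather than definitions taken from the paper.

```python
import numpy as np

def summarize_draws(estimates, true_value):
    """Summary statistics of Monte Carlo estimates of a scalar parameter.

    Assumes 'AD' means absolute deviation from the true value.
    """
    estimates = np.asarray(estimates, dtype=float)
    errors = estimates - true_value
    return {
        "bias": errors.mean(),                      # mean estimation error
        "rmse": np.sqrt(np.mean(errors ** 2)),      # root mean squared error
        "median": np.median(estimates),             # median of the estimates
        "mean_ad": np.mean(np.abs(errors)),         # mean absolute deviation
        "median_ad": np.median(np.abs(errors)),     # median absolute deviation
    }

# Hypothetical draws around a true value of 1 (not data from the paper).
stats = summarize_draws([0.8, 1.0, 1.3], true_value=1.0)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reporting the median-based measures alongside bias and RMSE is a common choice for estimators with non-normal limiting distributions, since they are less sensitive to occasional extreme draws.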
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